

MAINFRAME SAS ENHANCEMENTS IN THE SUPPORT 
OF EXPLORATORY DATA ANALYSIS 


by 


Richard Johnson and Jane F. Gentleman 


No. 24 


Statistics Canada 
Analytical Studies Branch 


Research Paper Series






Social and Economic Studies Division 
Analytical Studies Branch 
Statistics Canada 
1989 


The analysis presented in this paper is the responsibility of the authors and 
does not necessarily represent the views or policies of Statistics Canada. 


Also available in French.




Mainframe SAS Enhancements in Support of Exploratory Data Analysis 


by Richard Johnson and Jane F. Gentleman 
ABSTRACT 


This document is a manual describing computer software developed for
exploratory data analysis at Statistics Canada. The new software
comprises a collection of SAS functions and macros, with heavy emphasis
on graphics. Together with some functions already available in the SAS
System, these routines perform the following types of operations:

evaluation of probability density functions (PDF's), cumulative distribution 
functions (CDF's), and inverse CDF's for nine distributions; calculation of 
sample quantiles; generation of random numbers; graphing of histograms with 
optional PDF superimposition; calculation of empirical CDF's with optional 
graphing and optional CDF superimposition; and graphing of Q-Q and P-P 
plots comparing a sample to any of the nine distributions. Detailed 
instructions are provided for using the software to construct Q-Q plots 
comparing two samples. 
Foreword


This document is a manual describing computer software for exploratory data analysis,
the result of a joint effort between members of the Informatics Branch and the Analytical
Studies Branch at Statistics Canada. The manual is being distributed by the Informatics 
Branch to the computer user community within Statistics Canada, and it also appears here 
as a paper in the Analytical Studies Branch Research Paper Series. It is being re-published 
under the latter auspices because the Analytical Studies Branch Research Paper Series is 
intended to represent the broad array of activities being carried out within the Branch, 
including the production of research papers eventually destined for refereed journal 
publication, as well as other types of activities which support research and analysis. 

The software described herein was originally developed for use by students in a data 
analysis class offered by Statistics Canada as part of its on-going effort to increase data 
analytic capability within the agency. The software is now available for general use by 
Statistics Canada personnel. The Informatics Branch will release future updates of the 
manual as the software is further enhanced. 

Future enhancements for this software will include incorporation of weights for 
sample data and additional distributions. 

Following the manual are examples of graphs produced by the macros. 

It is not the intention of the Informatics Branch to distribute the software outside 
Statistics Canada. However, those who wish may request a photocopy of the new source 
code from the second author. 

Steven Earwaker coordinated and managed the data analysis class for which this
software was produced. Louise Bergeron provided expert programming assistance.


Richard Johnson, Informatics Branch 


Jane F. Gentleman, Analytical Studies Branch 
PREFACE


This document describes SAS functions and macros developed specifically in support of the 
pilot presentation of Statistics Canada course 04181! entitled “The Art of Data Analysis”. 
These facilities have been implemented on the Statistics Canada mainframe computer system. 
It is assumed that the reader is generally familiar with the mainframe operating environment 
and SAS Version 5 as configured for that environment. 


Eighteen functions were written to supplement those already available in the SAS system. 
When viewed together with nine of the original SAS functions, they constitute a collection of 
routines to evaluate three types of statistical probability functions: probability density
functions (PDF’s), cumulative distribution functions (CDF’s), and inverse cumulative distribution
functions. These theoretical functions are useful tools in data analysis. In addition, macros 
have been developed to perform more complex operations involving both the theoretical distri- 
butions and data. In the following list, some uses for the theoretical functions are given, and 
operations for which macros have been developed are identified. 


• Plots of PDF’s can be compared to appropriately constructed histograms. A SAS macro
has been developed to produce superimposed plots of this nature.

• CDF’s are useful for calculating significance levels (P-values), calculating certain
goodness-of-fit test statistics, calculating coordinates for P-P plots, and comparing
theoretical to empirical cumulative distribution functions (ECDF’s). A SAS macro has been
developed to compute and plot ECDF’s and optionally superimpose CDF’s. Another
macro has been developed to produce P-P plots.

• Inverse CDF’s can be used to calculate theoretical quantiles, calculate coordinates for
Q-Q plots, and generate random numbers. A SAS macro has been developed to produce
Q-Q plots.

• A SAS macro for calculating sample quantiles has been developed. This is useful for
analyzing the distribution of a sample of data and for calculating coordinates for Q-Q plots.

• SAS macros have been developed, based on available SAS random number functions, to
facilitate the generation of random numbers for five distribution types.


Sections 1 and 2 of this document provide detailed descriptions of how to use the SAS
functions and macros. Section 3 describes the minimal operational considerations associated 
with the production of graphics. An Appendix contains formulas for the probability density 
functions. 


The pilot version of “The Art of Data Analysis” was designed and presented by Dr. Jane
Gentleman of Social and Economic Studies Division, Analytical Studies Branch. Coordination
and logistics were managed by Stephen Earwaker of Social Survey Methods Division,
Methodology Branch.


This document and the special SAS facilities described herein were created by the SAS
support staff of Informatics Branch, Statistics Canada. The author of this document wishes to
acknowledge: the University of Waterloo, whose Fortran algorithms formed the basis of some of
the SAS functions written in support of the course; Dr. Gentleman and Stephen Earwaker for
valuable advice provided throughout the software development process; and Dr. Gentleman
for assistance in the writing and editing of this document.


At this early stage in the anticipated life of the functions and macros, it is possible that 
some errors may be present in the routines or in this document. Please notify the designated 
SAS Software Representative in Informatics Branch of any suspected errors. (The SAS help 
library member SITEINFO contains current information regarding SAS support contacts.) 
Specifications for the use of these facilities may change with continued development and 
refinement. 
Section 1
PROBABILITY FUNCTIONS 


The SAS system provides a variety of probability functions as documented in Chapter 6 of 
“SAS User's Guide: Basics, Version 5 Edition”. Numerous additional functions have been 
written, using VS Fortran, to complement those provided by SAS and complete a set viewed 
as desirable to meet the objectives of the course “The Art of Data Analysis”. 


1.1 PDF’s, CDF’s and Inverse CDF’s for Nine Distributions


The comprehensive set of relevant functions consists of three functions for each of nine distri- 
bution types. The functions are: the probability density function (PDF); the cumulative dis- 
tribution function (CDF); and the inverse of the cumulative distribution function (inverse 
CDF). Table 1 lists the nine distributions covered, gives the name of each function, and
indicates whether the function is an original SAS function or was written at Statistics Canada.


This document describes the functions written at Statistics Canada. “SAS User’s Guide: 
Basics” is the appropriate reference for the original SAS functions. Like SAS’s own functions, 
the locally written functions are routines that return a value computed from one or more argu- 
ments. They are used in the context of a DATA step and, typically, are executed with each 
iteration of the DATA step, that is, as the DATA step processes each observation in a SAS 
data set. Arguments to the functions documented below are positional and mandatory. They
are subjected to range validation. If invalid or missing arguments are detected, messages will 
be written to the SAS log and results will be set to missing. 


Table 1: A Comprehensive List of Probability Functions

Distribution   Parameters   PDF       CDF              Inverse CDF
Normal         mu,sigsq     PDFNORM   PROBNORM [1,2]   PROBIT [1,3]
Uniform        a,b          PDFUNI    PROBUNI          UNIINV
Exponential    theta        PDFEXP    PROBEXP          EXPINV
t              df           PDFT      PROBT [1]        TINV
Chi square     df           PDFCHI    PROBCHI [1]      CHIINV
F              df1,df2      PDFF      PROBF [1]        FINV
Weibull        lo,sc,sh     PDFWEI    PROBWEI          WEIINV
Gamma          lo,sc,sh     PDFGAM    PROBGAM [1,4]    GAMINV [1,5]
Beta           a,b          PDFBETA   PROBBETA [1]     BETAINV [1]

Notes:

[1] Original SAS function. (All others written at Statistics Canada.)

[2] PROBNORM(x) is the CDF of a $N(0,1)$ random variable at argument $x$. Use
$\mathrm{PROBNORM}((x-\mu)/\sigma)$ for the CDF of a $N(\mu,\sigma^2)$ random variable at argument $x$
(where $\mu$ = mu and $\sigma^2$ = sigsq).

[3] PROBIT(P) is the inverse CDF of a $N(0,1)$ random variable at argument $P$. Use
$(\sigma)\mathrm{PROBIT}(P)+\mu$ for the inverse CDF of a $N(\mu,\sigma^2)$ random variable at argument
$P$ (where $\mu$ = mu and $\sigma^2$ = sigsq).

[4] PROBGAM(x,sh) is the CDF of a Gamma(0,1,sh) random variable at argument
x. Use PROBGAM((x-lo)/sc,sh) for the CDF of a Gamma(lo,sc,sh) random
variable at argument x.

[5] GAMINV(P,sh) is the inverse CDF of a Gamma(0,1,sh) random variable at
argument P. Use (sc)GAMINV(P,sh)+lo for the inverse CDF of a
Gamma(lo,sc,sh) random variable at argument P.
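Notes 2 and 3 amount to standardizing the variable. As a minimal Python sketch (not the SAS implementation), the standard library's `statistics.NormalDist()` can stand in for PROBNORM and PROBIT; the helper names `cdf_normal` and `inv_cdf_normal` are hypothetical. The results are cross-checked against a `NormalDist` built directly from the mean and the standard deviation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0,1): stands in for PROBNORM (cdf) and PROBIT (inv_cdf)


def cdf_normal(x, mu, sigsq):
    """CDF of N(mu, sigsq) at x, computed as in note 2: PROBNORM((x - mu)/sigma)."""
    return STD_NORMAL.cdf((x - mu) / math.sqrt(sigsq))


def inv_cdf_normal(p, mu, sigsq):
    """Inverse CDF of N(mu, sigsq) at p, computed as in note 3: sigma*PROBIT(p) + mu."""
    return math.sqrt(sigsq) * STD_NORMAL.inv_cdf(p) + mu


# Cross-check against a NormalDist parameterized directly (mean, standard deviation).
mu, sigsq = 10.0, 4.0
direct = NormalDist(mu, math.sqrt(sigsq))
print(cdf_normal(12.0, mu, sigsq), direct.cdf(12.0))
print(inv_cdf_normal(0.975, mu, sigsq), direct.inv_cdf(0.975))
```

Because sigsq is a variance, its square root is taken before dividing or multiplying.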


1.2 Probability Density Functions Written at Statistics Canada 


1.2.1 PDFBETA


The PDFBETA function returns the probability density at argument x of a beta distribution 
with parameters a and b.


General form: 


PDFBETA(x,a,b) 
where 
x The value at which the probability density is to be evaluated. 0 < x < 1.
a First shape parameter. a > 0.
b Second shape parameter. b > 0. 
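The exact density formulas used by these functions are given in the Appendix. As an illustration only, assuming the standard beta density $f(x) = x^{a-1}(1-x)^{b-1}/B(a,b)$, a Python stand-in could look like this. The name `pdfbeta` is hypothetical, and `None` plays the role of the SAS missing value that the functions return for invalid arguments.

```python
import math


def pdfbeta(x, a, b):
    """Beta(a, b) density at x; returns None (like a SAS missing value) on invalid input."""
    if not (0 < x < 1 and a > 0 and b > 0):
        return None
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta)


print(pdfbeta(0.5, 2, 2))  # density of Beta(2, 2) at its mode
```

Working on the log scale avoids overflow in the gamma functions for large a or b.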
1.2.2 PDFCHI 


The PDFCHI function returns the probability density at argument x of a Chi-square distribu- 
tion with df degrees of freedom. 


General form: 


PDFCHI(x,df) 
where 
x The value at which the probability density is to be evaluated. x > 0.
df Number of degrees of freedom. df ≥ .5. Argument df need not be an integer.
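Assuming the usual chi-square density $f(x) = x^{df/2-1}e^{-x/2}/(2^{df/2}\Gamma(df/2))$, a sketch in the same hypothetical style follows. A non-integer df poses no difficulty because `math.lgamma` accepts any positive real argument.

```python
import math


def pdfchi(x, df):
    """Chi-square density with df degrees of freedom; None mimics a SAS missing value."""
    if not (x > 0 and df >= 0.5):  # argument ranges as documented for PDFCHI
        return None
    k = df / 2.0
    log_f = (k - 1) * math.log(x) - x / 2.0 - k * math.log(2.0) - math.lgamma(k)
    return math.exp(log_f)


print(pdfchi(2.0, 2))    # df = 2 reduces to the exponential density 0.5*exp(-x/2)
print(pdfchi(1.0, 3.5))  # df need not be an integer
```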
1.2.3 PDFEXP 


The PDFEXP function returns the probability density at argument x of an exponential
distribution with mean theta.


General form: 


PDFEXP(x, theta) 
where 
x The value at which the probability density is to be evaluated. x ≥ 0.
theta Mean theta. Theta > 0. 
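Note that theta is the mean, not the rate. Assuming the density $f(x) = e^{-x/\theta}/\theta$ for $x \ge 0$, a hypothetical Python stand-in is:

```python
import math


def pdfexp(x, theta):
    """Exponential density with MEAN theta (rate = 1/theta); None mimics a SAS missing value."""
    if not (x >= 0 and theta > 0):
        return None
    return math.exp(-x / theta) / theta


print(pdfexp(0.0, 2.0))  # at x = 0 the density equals 1/theta
```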


1.2.4 PDFF


The PDFF function returns the probability density at argument x of an F distribution with df1
and df2 degrees of freedom.


General form: 


PDFF(x,df1,df2) 


where 

x The value at which the probability density is to be evaluated. x > 0.

df1 Numerator degrees of freedom. df1 > 0. Argument df1 need not be an integer.
df2 Denominator degrees of freedom. df2 > 0. Argument df2 need not be an integer.
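Assuming the standard F density
$f(x) = \frac{(df_1/df_2)^{df_1/2}\,x^{df_1/2-1}\,(1+df_1x/df_2)^{-(df_1+df_2)/2}}{B(df_1/2,\,df_2/2)}$,
a hypothetical Python stand-in is shown below; like the beta sketch, it works on the log scale.

```python
import math


def pdff(x, df1, df2):
    """F(df1, df2) density at x; None mimics a SAS missing value."""
    if not (x > 0 and df1 > 0 and df2 > 0):
        return None
    h1, h2 = df1 / 2.0, df2 / 2.0
    log_beta = math.lgamma(h1) + math.lgamma(h2) - math.lgamma(h1 + h2)
    log_f = (h1 * math.log(df1 / df2) + (h1 - 1) * math.log(x)
             - (h1 + h2) * math.log1p(df1 * x / df2) - log_beta)
    return math.exp(log_f)


print(pdff(1.0, 2, 2))  # F(2, 2) has density 1/(1 + x)**2
```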


1.2.5 PDFGAM


The PDFGAM function returns the probability density at argument x of a gamma distribution 
with the given location, scale and shape parameters. 


General form: 


PDFGAM(x,lo,sc,sh)
where
x The value at which the probability density is to be evaluated. x > lo.
lo Location parameter. -∞ < lo < ∞.
sc Scale parameter. sc > 0. 
sh Shape parameter. sh > 0. 
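Assuming location and scale enter by standardization, consistent with note 4 of Table 1 ($z = (x-lo)/sc$), the three-parameter density is $f(x) = z^{sh-1}e^{-z}/(sc\,\Gamma(sh))$ for $x > lo$. A hypothetical Python stand-in; check the Appendix for the formula the function actually uses:

```python
import math


def pdfgam(x, lo, sc, sh):
    """Gamma(lo, sc, sh) density at x; None mimics a SAS missing value."""
    if not (x > lo and sc > 0 and sh > 0):
        return None
    z = (x - lo) / sc  # standardize to Gamma(0, 1, sh), as in note 4 of Table 1
    # Dividing by sc accounts for the change of variable from x to z.
    return math.exp((sh - 1) * math.log(z) - z - math.lgamma(sh)) / sc


print(pdfgam(3.0, 1.0, 2.0, 1.0))  # sh = 1 gives a shifted exponential with mean sc
```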


1.2.6 PDFNORM


The PDFNORM function returns the probability density at argument x of a Normal
distribution with mean mu and variance sigsq.


General form: 
PDFNORM(x,mu,sigsq) 
where 
x The value at which the probability density is to be evaluated. -∞ < x < ∞.
mu Mean (μ). -∞ < μ < ∞.
sigsq Variance (σ²). σ² > 0.
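The third argument of PDFNORM is the variance sigsq, not the standard deviation, which is an easy source of error. A hypothetical Python stand-in, cross-checked against `statistics.NormalDist`, which takes the standard deviation:

```python
import math
from statistics import NormalDist


def pdfnorm(x, mu, sigsq):
    """N(mu, sigsq) density at x, where sigsq is the VARIANCE; None mimics a SAS missing value."""
    if not sigsq > 0:
        return None
    return math.exp(-(x - mu) ** 2 / (2.0 * sigsq)) / math.sqrt(2.0 * math.pi * sigsq)


# NormalDist expects the standard deviation, so pass sqrt(sigsq) = 2.0 here.
print(pdfnorm(1.0, 0.0, 4.0), NormalDist(0.0, 2.0).pdf(1.0))
```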


1.2.7 PDFT 


The PDFT function returns the probability density at argument x of a t distribution with df 
degrees of freedom. 


General form: 


PDFT(x,df) 
where 
x The value at which the probability density is to be evaluated. -∞ < x < ∞.
df Number of degrees of freedom. df > 0. Argument df need not be an integer. 
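Assuming the standard Student t density
$f(x) = \frac{\Gamma((df+1)/2)}{\sqrt{df\,\pi}\,\Gamma(df/2)}\,(1+x^2/df)^{-(df+1)/2}$,
a hypothetical Python stand-in follows. With df = 1 it reduces to the Cauchy density, which gives a simple check.

```python
import math


def pdft(x, df):
    """Student t density with df degrees of freedom (df may be non-integer)."""
    if not df > 0:
        return None
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))


print(pdft(0.0, 1))  # df = 1 is the Cauchy density, 1/pi at the origin
```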
1.2.8 PDFUNI 


The PDFUNI function returns the probability density at argument x of a uniform distribution
on the interval [a,b].


General form: 


PDFUNI(x,a,b) 
where 
x The value at which the probability density is to be evaluated.
a Lower limit of the interval. a < b.
b Upper limit of the interval. a < b. 
1.2.9 PDFWEI 


The PDFWEI function returns the probability density at argument x of a Weibull distribution
with the given location, scale and shape parameters. 


General form: 
PDFWEI(x,lo,sc,sh)


where 


x The value at which the probability density is to be evaluated. x > lo. 


lo Location parameter. -∞ < lo < ∞.
sc Scale parameter. sc > 0.
sh Shape parameter. sh > 0.
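The Weibull formula used by PDFWEI is given in the Appendix and is not reproduced in this section. Assuming the common three-parameter form with $z = (x-lo)/sc$, namely $f(x) = (sh/sc)\,z^{sh-1}e^{-z^{sh}}$ for $x > lo$, a hypothetical Python stand-in is below. The parameterization is an assumption and should be checked against the Appendix before use.

```python
import math


def pdfwei(x, lo, sc, sh):
    """Weibull(lo, sc, sh) density at x (ASSUMED parameterization); None on invalid input."""
    if not (x > lo and sc > 0 and sh > 0):
        return None
    z = (x - lo) / sc
    return (sh / sc) * z ** (sh - 1) * math.exp(-z ** sh)  # -z ** sh means -(z ** sh)


print(pdfwei(2.0, 0.0, 2.0, 1.0))  # sh = 1 reduces to an exponential density with mean sc
```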


1.3. Cumulative Distribution Functions Written at Statistics Canada 


1.3.1 PROBEXP 


The PROBEXP function returns the probability that a random variable having the exponential 
distribution with mean theta is less than or equal to the input argument x. 


General form: 


PROBEXP(x, theta) 
where 
x The value at which the function is to be evaluated. x ≥ 0.
theta Mean theta. Theta > 0.
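With theta as the mean, the exponential CDF is $F(x) = 1 - e^{-x/\theta}$. A hypothetical Python stand-in; `math.expm1` keeps the result accurate when x/theta is small:

```python
import math


def probexp(x, theta):
    """P(X <= x) for an exponential variable with mean theta; None mimics a SAS missing value."""
    if not (x >= 0 and theta > 0):
        return None
    return -math.expm1(-x / theta)  # 1 - exp(-x/theta)


theta = 2.0
print(probexp(theta * math.log(2.0), theta))  # the median, theta*ln(2), gives 0.5
```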


1.3.2 PROBUNI 


The PROBUNI function returns the probability that a random variable having the uniform 
distribution on the interval (a,b) is less than or equal to the input argument x.


General form: 


PROBUNI(x,a,b) 
where 
x The value at which the function is to be evaluated. a < x < b. 
a Lower limit of the interval. a < b.
b Upper limit of the interval. a < b.


1.3.3 PROBWEI

The PROBWEI function returns the probability that a random variable having the Weibull 
distribution with the given location, scale and shape parameters is less than or equal to the 
input argument x. 


General form: 


PROBWEI(x,lo,sc,sh)


where 

x The value at which the function is to be evaluated. x > lo. 
lo Location parameter. -∞ < lo < ∞.

sc Scale parameter. sc > 0. 

sh Shape parameter. sh > 0. 
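Under the same assumed three-parameter Weibull form used for the PDFWEI sketch, the CDF is $F(x) = 1 - e^{-z^{sh}}$ with $z = (x-lo)/sc$. A hypothetical Python stand-in; at $x = lo + sc$ the result is $1 - e^{-1}$ whatever the shape, which is a convenient check:

```python
import math


def probwei(x, lo, sc, sh):
    """P(X <= x) for the ASSUMED three-parameter Weibull; None mimics a SAS missing value."""
    if not (x > lo and sc > 0 and sh > 0):
        return None
    z = (x - lo) / sc
    return -math.expm1(-z ** sh)  # 1 - exp(-(z ** sh))


print(probwei(5.0, 2.0, 3.0, 1.7))  # x = lo + sc
```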


1.4 Inverse Cumulative Distribution Functions Written at Statistics Canada 


1.4.1 CHIINV 


The CHIINV function returns the Chi-square value x, such that a random variable, distribut- 
ed as Chi-square with df degrees of freedom, is less than or equal to x with probability p. 


General form: 


CHIINV(p,df)
where 
p Probability in range [0,1]. 
df Number of degrees of freedom. df ≥ .5. Argument df need not be an integer.
1.4.2 EXPINV 


The EXPINV function returns the exponential value x, such that a random variable,
distributed as exponential with mean theta, is less than or equal to x with probability p.


General form: 
EXPINV(p, theta) 


where 


p Probability in range [0,1]. 


theta Mean theta. Theta > 0. 
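Inverse CDFs are also what make random-number generation by the inversion method possible, as noted in the Preface. Assuming $F(x) = 1 - e^{-x/\theta}$, the inverse is $x = -\theta\ln(1-p)$. In this hypothetical sketch p = 1, whose quantile is infinite, returns None; how EXPINV itself treats that endpoint is not described here.

```python
import math
import random


def expinv(p, theta):
    """Quantile x with P(X <= x) = p for an exponential variable with mean theta."""
    if not (0 <= p < 1 and theta > 0):  # p = 1 would give an infinite quantile
        return None
    return -theta * math.log1p(-p)


# Inversion method: uniform draws on [0, 1) mapped through the inverse CDF.
rng = random.Random(12345)
draws = [expinv(rng.random(), 2.0) for _ in range(10000)]
print(sum(draws) / len(draws))  # should be close to the mean theta = 2.0
```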


1.4.3 FINV 


The FINV function returns the F value x, such that a random variable, distributed as F with 
df1 and df2 degrees of freedom, is less than or equal to x with probability p.


General form: 


FINV(p,df1,df2)


where 

p Probability in range [0,1]. 

df1 Numerator degrees of freedom. df1 > 0. Argument df1 need not be an integer.

df2 Denominator degrees of freedom. df2 > 0. Argument df2 need not be an integer.

1.4.4 TINV 


The TINV function returns the t value x, such that a random variable, distributed as t with df 
degrees of freedom, is less than or equal to x with probability p. 


General form: 


TINV(p,df)
where 
p Probability in range [0,1]. 
df Number of degrees of freedom. df > 0. Argument df need not be an integer.
1.4.5 UNIINV 


The UNIINV function returns the uniform value x, such that a random variable, distributed 
as uniform on the interval (a,b), is less than or equal to x with probability p.


General form: 
UNIINV(p,a,b) 


where 


p Probability in range [0,1]. 


a Lower limit of the interval. a < b.
b Upper limit of the interval. a < b.
1.4.6 WEIINV


The WEIINV function returns the Weibull value x, such that a random variable, distributed
as Weibull with the given location, scale and shape parameters, is less than or equal to x with
probability p.


General form: 


WEIINV(p,lo,sc,sh)
where
p Probability in range [0,1].
lo Location parameter. -∞ < lo < ∞.
sc Scale parameter. sc > 0. 
sh Shape parameter. sh > 0. 
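Under the same assumed three-parameter Weibull form used for the PDFWEI and PROBWEI sketches, solving $p = 1 - e^{-((x-lo)/sc)^{sh}}$ for x gives $x = lo + sc\,(-\ln(1-p))^{1/sh}$. A hypothetical Python stand-in; as with the EXPINV sketch, p = 1 returns None:

```python
import math


def weiinv(p, lo, sc, sh):
    """Quantile of the ASSUMED three-parameter Weibull; None mimics a SAS missing value."""
    if not (0 <= p < 1 and sc > 0 and sh > 0):
        return None
    # Invert F(x) = 1 - exp(-((x - lo)/sc)**sh) for x.
    return lo + sc * (-math.log1p(-p)) ** (1.0 / sh)


p = 1.0 - math.exp(-1.0)
print(weiinv(p, 2.0, 3.0, 1.7))  # lo + sc = 5, up to rounding, for any shape
```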
Section 2
MACROS 


This section describes SAS macros written to perform operations on entire SAS data sets. The
macros consist of macro statements, and data step programming statements and/or complete
DATA and PROC steps. All of the macros were written for name-style macro calls; that is,
the form of the invocation is %macroname(parameters) as described in Chapter 19 of “SAS
User’s Guide: Basics, Version 5 Edition”.


Several of the macros described in this section produce plots. All plots are produced in 
landscape orientation on the IBM 3800-3 laser page printer. This device has been chosen 
because it is available to all mainframe SAS users, and also because plots sent directly to the 
3800-3 are produced in the preferred landscape orientation. (By comparison, landscape orien- 
tation of graphs on a cut-sheet 3820 laser printer requires creation of a graphics catalog, fol- 
lowed by template replay to effect 90-degree rotation.) 


Other graphics devices can be used with the macros described in this section only by
modifying the GOPTIONS statement in the macro source code. For occasional needs, this can
more easily be done on-line under Display Manager, where the macro can be brought into the
editor screen with an INCLUDE INCL(macroname) command (where macroname is the name
of the macro but without the leading % character), modified and SUBMITted. Once a
modified macro has been submitted from the editor screen, that version will take precedence over
the original version in the default autocall library whenever that macro is invoked
subsequently in the SAS session. In batch mode, the user must create a modified copy of the macro in a
user autocall library and then specify the root of that library to the INCL parameter of the
JCL procedure. The batch approach can also be used on-line and is appropriate for regular
use of an alternative graphics device such as a pen plotter or graphics display terminal. Please
note that some of the plotting macros described below have subordinate macros which do the
actual plotting and therefore contain the GOPTIONS statement. In the macro descriptions
which follow (in the narrative portion preceding the detailed specifications), the name of the
macro which contains the GOPTIONS statement is given.


By default, SAS/GRAPH will automatically scale the axes of a plot by taking into
consideration the ranges of values of the variables being plotted, the character cell alignment of
tick-mark values on the axes, and space required for user-specified titles and footnotes. If several
data samples having differing value ranges are plotted in this way, the axes of the plots will
have different limits¹ and therefore may be given different lengths even though titling may be
consistent and the same graphics device is being used. Such differences in the plots may
hinder comparison of the samples. Fortunately, SAS/GRAPH provides the means to force axis
consistency so that plots of differing samples can be compared.

¹ In this discussion of axes, the term “length” will refer to the physical length (measurable in
inches or centimeters), and the term “limits” will refer to the numerical length reflected by the
tick-mark values.


Thus, each of the plotting macros can operate in either of two modes regarding axis 
length and limits, depending on the use of available parameters to the macros. Axis length 
can be determined by SAS (parameter AXES= SAS or no AXES parameter specified) or can 
be fixed at a predetermined percentage² of the dimensions of the total plotting surface
(AXES= FIXED). Fixed mode imposes restrictions on the use of titles and footnotes, the 
restrictions varying with the choice of graphics device because of differing character cell 
dimensions. On the 3800-3 used by the macros, titles and footnotes of default height can be 
used in 2-and-1 combinations: either 2 titles and one footnote, or 2 footnotes and one title. 
With no footnotes, a maximum of four titles of default height can be used. Heights other 
than the defaults may be used on a trial-and-error basis. 


The axis limits can be allowed to default to those of the relevant variable, or can be speci- 
fied via XAXIS and YAXIS parameters. Values specified for the XAXIS and YAXIS parame- 
ters can correspond to any of the forms values can take for the ORDER option of the 
SAS/GRAPH AXIS statement. Values can be specified as a list (e.g., XAXIS=1 3 5 7 9), as 
a range (e.g., XAXIS=1 9 or XAXIS=1 TO 9), as a range with an increment (e.g.,
XAXIS=1 TO 9 BY 2), or as a combination of any of these forms. In the context of these 
macros, specification of the parameter value as a range with an increment will generally be the 
most appropriate. In this case, SAS will attempt to annotate the axis tick-marks with the 
specified incremental values. Regardless of the method used to specify axis values, the values 
will always be evenly spaced along the relevant axis. 


2.1. Histograms with Optional PDF Superimposition 


A macro has been written to produce a histogram wherein the widths of the vertical bars can 
vary in accordance with user-specified boundaries. The macro will also create an output SAS 
data set having one observation for each histogram interval and containing variables for lower 
interval boundary, upper interval boundary, frequency and bar height. Optionally, the macro 
will superimpose a user-specified theoretical probability density function (PDF) between
user-specified lower and upper limits. The graph will be produced on the 3800-3 laser printer. To
use an alternative graphics device, it is necessary to modify the GOPTIONS statement con- 
tained in the %HIST macro. (See the general information at the beginning of this section.) 


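In order to scale the histogram properly so that a PDF can be superimposed, the bar
heights are calculated as follows:

$$\text{bar height} = \frac{\text{number of data values in the interval}}{(\text{interval width})\,(\text{total number of data values})}$$

Thus, like a PDF, the total area of the histogram is one: multiplying each bar height by its
interval width gives the proportion of the data in that interval, and these proportions sum to one.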
Parameters must be provided to identify the input SAS data set and to specify interval
boundaries for the histogram. Any data value which is equal to a boundary between two bars
is counted as being in the higher interval. A data value equal to the lowest/highest boundary
is counted as being in the leftmost/rightmost interval. If any data values fall outside the outer
histogram boundaries specified by the user, the macro will generate additional lower and/or
upper histogram intervals to accommodate the data.

² The percentage is fixed in the macro code and cannot be modified by the user without macro modification.
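The bar-height formula and the boundary rules can be sketched in Python as follows. This is an illustration, not the %HIST source: the function name `histogram_bars` is hypothetical, and it returns the same four quantities as the OUT= data set (LOWER, UPPER, FREQ, HEIGHT). Because the width of the extra intervals %HIST generates for out-of-range data is not specified here, the sketch simply rejects such values.

```python
from bisect import bisect_right


def histogram_bars(values, bnds):
    """Return (lower, upper, freq, height) for each interval defined by sorted boundaries."""
    n_bars = len(bnds) - 1
    freq = [0] * n_bars
    for v in values:
        if not bnds[0] <= v <= bnds[-1]:
            raise ValueError("sketch assumes all values lie within the outer boundaries")
        # bisect_right puts a value equal to an inner boundary into the higher interval;
        # min() keeps a value equal to the highest boundary in the rightmost interval.
        i = min(bisect_right(bnds, v) - 1, n_bars - 1)
        freq[i] += 1
    total = len(values)
    rows = []
    for i in range(n_bars):
        width = bnds[i + 1] - bnds[i]
        rows.append((bnds[i], bnds[i + 1], freq[i], freq[i] / (width * total)))
    return rows


rows = histogram_bars([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], [0, 1, 2, 4])
for lower, upper, f, height in rows:
    print(lower, upper, f, height)
print(sum((upper - lower) * height for lower, upper, _, height in rows))  # total area
```

The unequal widths in the example ([0,1], [1,2], [2,4]) show why dividing by the interval width matters: the third bar holds as many values as the others but is half as tall.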


If the appropriate parameter is provided to select a PDF, then a graph of that function
will be superimposed on the histogram. In this case, the user has the option of specifying
lower and upper plotting limits for the PDF via another parameter, or letting the minimum and
maximum sample values be used by default.


Macro parameters are in the form keyword= value and can be specified in any order. 


2.1.1 %HIST Macro 
General form: 
%HIST (parameter list)
Parameters: 


IN= the name of the input sample SAS data set for which a histogram is to be
produced. This parameter is required. If it is not specified, a message will be
written to the SAS log and the macro will terminate.


INVAR= the name of the variable, in the input SAS data set identified by the IN= 
parameter, for which a histogram is to be produced. If this parameter is not
specified, the variable name X will be assumed. 


BNDS= the interval boundaries for the histogram. Specify the lower bound for each 
bar, proceeding from left to right, and terminate the list with the upper bound 
of the rightmost bar. (There are no gaps between bars.) Separate the boundary 
values by at least one blank. This parameter is required. If it is not specified, a
message will be written to the SAS log and the macro will terminate. 


AXES= the method used to determine the lengths of the axes for the histogram.
AXES=SAS will allow SAS to determine the lengths of the axes.
AXES= FIXED will fix the axes at a predetermined percentage of the total 
plotting surface dimensions. If this parameter is not specified, AXES=SAS will 
be assumed. (See also the discussion of axis length at the beginning of this sec- 
tion.) 


XAXIS= the X-axis limits and intermediate tick-mark values. This parameter is optional.
It is less likely to be used in this macro than in other plotting macros. This is
because %HIST simulates a histogram through a plotting technique. As a
result, SAS/GRAPH provides evenly spaced X-axis tick-marks that do not
necessarily correspond to user-specified interval boundaries. (PROC GPLOT
doesn’t know that it is drawing a histogram.) If XAXIS values corresponding
to interval boundaries are specified, the boundaries will be evenly spaced along
the X-axis, thereby defeating the purpose of the macro if the intervals are not of
equal width. (See the discussion of axis values at the beginning of this section
for details on the syntax of this parameter.)
YAXIS= the Y-axis limits and intermediate tick-mark values. This parameter is optional.
If used, its value can correspond to any of the forms values can take for the
ORDER option of the SAS/GRAPH AXIS statement. (See the discussion of
axis values at the beginning of this section for details on the syntax of this
parameter.)

FUNC= the PDF specification which will result in a theoretical curve being
superimposed on the histogram. This parameter is optional. If it is used it must be
specified in the form FUNC=function-name(arguments), where function-name is
the name of a PDF described in Section 1.2 of this document, and arguments is
the argument list required by the chosen function. The first element in the
function’s argument list, which is the name of the variable at which the function
is to be evaluated, must always be coded as X. (This is not the same X as the
default INVAR variable.)

LIMITS= the minimum and maximum values of X (argument to the PDF) at which the
PDF will be evaluated. (The macro will generate 200 values of X, evenly
distributed between the limits. The PDF will be evaluated and plotted as 200
points connected by straight lines.) The limits must be given in the order
LIMITS=lower upper with at least one blank separator between the limits.
This parameter will be used only if FUNC= has been specified. If FUNC= is
specified and LIMITS= is not specified, then the minimum and maximum
sample values will be used by default. If the lower limit specified for the PDF is
greater than the minimum sample value, then the latter will be used instead as
the lower limit. Similarly, if the upper limit specified for the PDF is less than
the maximum sample value, then the latter will be used instead as the upper
limit.

OUT= the name of the output SAS data set which will contain information about the
intervals of the histogram. The data set will contain one observation for each
interval. The observations will contain the following variables:

    LOWER   the lower boundary for an interval of the histogram.
    UPPER   the upper boundary for an interval of the histogram.
    FREQ    the frequency with which the value range described by the
            boundaries occurs in the input sample.
    HEIGHT  the height of the histogram bar which depicts the current interval.

This parameter is required in order to obtain an output data set. If it is not
specified, no output data set will be produced.
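The LIMITS= rules in the parameter list can be sketched in Python. Two assumptions are made: the 200 points include both limits (the text says only that they are evenly distributed between them), and `pdf_grid` is a hypothetical name. `NormalDist(0, 1).pdf` stands in for FUNC=PDFNORM(X,0,1).

```python
from statistics import NormalDist


def pdf_grid(sample, limits=None, n_points=200):
    """Evenly spaced X values at which a superimposed PDF would be evaluated."""
    lo_s, hi_s = min(sample), max(sample)
    if limits is None:
        lower, upper = lo_s, hi_s  # default: sample minimum and maximum
    else:
        # A limit that would cut off part of the sample is widened to cover it.
        lower, upper = min(limits[0], lo_s), max(limits[1], hi_s)
    step = (upper - lower) / (n_points - 1)
    return [lower + i * step for i in range(n_points)]


sample = [-1.2, 0.4, 2.9]
xs = pdf_grid(sample, limits=(-3.6, 3.6))
curve = [(x, NormalDist(0.0, 1.0).pdf(x)) for x in xs]  # like FUNC=PDFNORM(X,0,1)
print(len(curve), xs[0], xs[-1])
```

Plotting `curve` as points joined by straight lines corresponds to the way the macro draws the superimposed PDF.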


The following example will produce a histogram of the variable SAMPL in SAS data set
WORK.NORM using interval boundaries as specified. The histogram will be plotted as 9 bars
beginning at -2.5 and ending at 3.5. It will be overlaid with a plot of the function
PDFNORM for 200 values evenly distributed between -3.6 and 3.6. Arguments to
PDFNORM are: the mandatory variable name X, mean 0, and variance 1. An output SAS
data set named HISTINFO will be created. It will contain nine observations, one for each
interval.


%HIST(IN=NORM, INVAR=SAMPL,
BN re ee a eee OO ale 1 Sig 
FUNC=PDFNORM(X,0,1),LIMITS=-3.6 3.6, 
OUT=HISTINFO) 


2.2. ECDF’s with Optional Plotting and Optional CDF Superimposition 


A macro has been written to calculate the empirical cumulative distribution function (ECDF)
evaluated at all the ordered observations of a designated variable. The ECDF is defined as
follows. If $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the sample values sorted in non-descending order, then the value
of the ECDF at argument $x_{(i)}$ is $i/n$.
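A direct Python reading of this definition follows. The name `ecdf` is hypothetical, and the sketch returns (value, ECDF) pairs rather than adding a new variable to a data set. Because `sorted()` returns a new list, the input order is left untouched, which is analogous to specifying an output data set with OUT=. As the definition is stated, tied observations receive successive values of i/n.

```python
def ecdf(values):
    """Return (x_(i), i/n) pairs for the sample sorted in non-descending order."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


print(ecdf([3.1, 0.4, 2.2, 1.5]))
# [(0.4, 0.25), (1.5, 0.5), (2.2, 0.75), (3.1, 1.0)]
```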


The results of the calculation are placed in a new variable which is written to an output 
data set along with all input variables. An input data set must be specified in the parameter 
list. If an output data set is not specified, the input data set will be reused. It should be not- 
ed that, as a result of a sort step in the macro, the output data set will be produced in ascend- 
ing sequence of the input variable. If the original sequence of the input data set must be 
retained, it is necessary to specify an output data set name in the parameter list. 


Optionally, a plot of the ECDF will be produced on the 3800-3 laser printer. In order to
use an alternative graphics device, it is necessary to modify the GOPTIONS statement
contained in the %ECDFPLT macro called by %ECDF. (See the general information at the
beginning of this section.) If an ECDF plot is requested, a user-specified theoretical
cumulative distribution function (CDF) can optionally be superimposed on the plot of the ECDF by
specifying the appropriate parameter to select a CDF. In this case, the user has the option of
specifying lower and upper plotting limits for the CDF via another parameter, or letting the
minimum and maximum sample values be used by default.


Macro parameters are in the form keyword= value and can be specified in any order. 


2.2.1 %ECDF Macro 


General form: 


%ECDF (parameter list) 


Parameters: 

IN= the name of the input SAS data set. This parameter is required. If it is not
specified, a message will be written to the SAS log and the macro will termi- 
nate. 

INVAR= the name of the input variable for which the ECDF is to be calculated. If this
parameter is not specified, the variable name X will be assumed.
NEWVAR= the name of the new variable which will contain the ECDF values. If this
parameter is not specified, the name F_HAT will be used.

OUT= the name of the output SAS data set to be produced. If this parameter is not
specified, a warning message will be written to the SAS log and the input data
set will be reused.

PLOT= indicates whether or not an ECDF plot is to be produced on the 3800-3 laser
printer. The default value is NO. Specify PLOT=YES to obtain a plot.

AXES= the method used to determine the lengths of the axes for the optional plot.
AXES=SAS will allow SAS to determine the lengths of the axes.
AXES=FIXED will fix the axes at a predetermined percentage of the total
plotting surface dimensions. If this parameter is not specified, AXES=SAS
will be assumed. If PLOT=NO is in effect, this parameter will be ignored.
(See also the discussion of axis length at the beginning of this section.)

XAXIS= the X-axis limits and intermediate tick-mark values. This parameter is
optional. If used, its value can correspond to any of the forms values can take for
the ORDER option of the SAS/GRAPH AXIS statement. If PLOT=NO is
in effect, this parameter will be ignored. (See the discussion of axis values at
the beginning of this section for details on the syntax of this parameter.)

YAXIS= the Y-axis limits and intermediate tick-mark values. (See XAXIS=.)

FUNC=

LIMITS=


the CDF specification which will result in a theoretical curve being superim- 
posed on the ECDF plot produced in response to the PLOT= YES parameter. 
(PLOT = YES is, therefore, a prerequisite to the use of this parameter.) This 
parameter must be given in the form I'UNC = function-name(arguments), where 
function-name is the name of a CDI’ described either in “SAS User’s Guide: 
Basics, Version 5 Edition” or in Section 1.3 of this document, and arguments is 
the argument list required by the chosen function. The first element in the 
function's argument list, which is the name of the variable at which the func- 
tion is to be evaluated, must always be coded as X. (This is not the same X 
as the default INVAR variable.) 


the minimum and maximum values of X (argument to the CDF) at which the 
CDF will be evaluated. (The macro will generate 200 values of X, evenly dis- 
tributed between the limits. The CDI’ will be evaluated and plotted as 200 
points connected by straight lines.) The limits must be given in the order 
LIMITS =l/ower upper with at Icast one blank separator between the limits. 
This parameter will be used only if ('UNC= has been specified. If FUNC= 
is specified and LIMITS= is not specified, then the minimum and maximum 
sample values will be used by default. If the lower limit specified for the CDF 
is greater than the minimum sample value, then the latter will be used instead 
as the lower limit. Similarly, if the upper limit specified for the CDF is less 
than the maximum sample value, then the latter will be used instead as the 
upper limit. 
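The LIMITS= rules can be re-expressed in Python as a small sketch. It is not the macro's code: the function name `cdf_plot_grid` is hypothetical, and the sketch assumes that the 200 evenly distributed values include both limits as endpoints.

```python
def cdf_plot_grid(sample, lower=None, upper=None, npoints=200):
    """Return the X values at which a superimposed CDF would be evaluated.

    Follows the LIMITS= rules: default to the sample minimum and maximum,
    and never let the limits cut off part of the sample.
    """
    lo, hi = min(sample), max(sample)
    if lower is not None:
        lo = min(lower, lo)  # a lower limit above the sample minimum gives way to it
    if upper is not None:
        hi = max(upper, hi)  # an upper limit below the sample maximum gives way to it
    step = (hi - lo) / (npoints - 1)  # assumption: both limits are grid endpoints
    return [lo + j * step for j in range(npoints)]


# Requested limits 2 and 3 lie inside the sample range, so 1 and 4 are used
grid = cdf_plot_grid([1.0, 4.0], lower=2.0, upper=3.0)
```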


2.3 Sample Quantiles 


A macro has been written to compute sample quantiles. The sample P-quantile $Q(P)$ is
defined as follows. Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the sample values sorted in non-descending order.
Then $Q(P) = x_{(nP+.5)}$, where linear interpolation is used if $1 < nP+.5 < n$ and $nP+.5$ is not an
integer. If $P < .5/n$, then $Q(P) = x_{(1)}$. If $P > (n-.5)/n$, then $Q(P) = x_{(n)}$. The quantity $nP+.5$
is the index of $Q(P)$.
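The definition can be checked with a small Python sketch (illustrative only; `sample_quantile` is a hypothetical name, not part of the SAS library). It returns $Q(P)$ together with the index $nP+.5$, interpolating linearly between $x_{(j)}$ and $x_{(j+1)}$, where $j$ is the integer part of the index.

```python
import math


def sample_quantile(sample, p):
    """Return (Q(P), nP + .5) using Q(P) = x_(nP+.5) with linear interpolation."""
    ordered = sorted(sample)
    n = len(ordered)
    index = n * p + 0.5              # the quantile index nP + .5
    if index <= 1:                   # P <= .5/n: smallest observation
        return ordered[0], index
    if index >= n:                   # P >= (n - .5)/n: largest observation
        return ordered[-1], index
    j = math.floor(index)            # 1-based position of the lower neighbour
    frac = index - j                 # 0 when the index is an integer
    q = (1 - frac) * ordered[j - 1] + frac * ordered[j]
    return q, index


print(sample_quantile([40, 10, 30, 20], 0.5))  # -> (25.0, 2.5)
```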
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Two SAS data sets are required as input to this macro: a sample data set containing the 
variable from which quantiles are to be computed, and a data set containing one or more 
probabilities for which quantiles are to be computed. The SAS data set containing the prob- 
abilities will consist of one observation for each probability. Parameters must be provided for 
both input data sets. The macro produces a sorted, temporary copy of the input sample data 
set; the sequence of the original is not altered. An output data set is created containing one 
observation for each observation in the data set of probabilities. Output observations will 
contain the input probability variable and computed quantile and quantile-index variables. 
Note that if no parameter is provided to name the output data set, the input data set of prob- 
abilities will be reused. In this case, all original variables will be retained and the quantile and 
quantile-index variables will be added. Macro parameters are in the form keyword= value and 
can be specified in any order. 


2.3.1 %QUANT Macro


General form: 


%QUANT(parameter list)


Parameters: 

IN= the name of the input sample SAS data set from which quantiles are to be computed.
This parameter is required. If it is not specified, a message will be written
to the SAS log and the macro will terminate.

INVAR= the name of the sample variable, in the input SAS data set identified by the 
IN= parameter, from which quantiles are to be computed. If this parameter is 
not specified, the variable name X will be assumed. 

INP= the name of the input SAS data set containing one or more probabilities for
which quantiles are to be computed. This parameter is required. If it is not
specified, a message will be written to the SAS log and the macro will terminate.


INPVAR= the name of the variable, in the input SAS data set identified by the INP=
parameter, containing the probabilities for which quantiles are to be computed.
These probabilities may be any values between zero and one and need not be
sorted. If this parameter is not specified, a message will be written to the SAS
log and the macro will terminate.


OUT= the name of the output SAS data set which will contain the computed quantiles. 
The data set will contain one observation for each observation in the input 
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probabilities data set. If this parameter is not specified, the input probabilities 
data set, specified by the INP= parameter, will be reused. In this case, all vari- 
ables contained in the INP= data set will be retained. 


Q= the name of the output variable which will contain the computed quantiles. If 
this parameter is not specified, the variable name Q will be used. 


QI= the name of the output variable which will contain the computed quantile indi- 
ces. If this parameter is not specified, the variable name QI will be used. 


2.4 Probability (Q-Q and P-P) Plots 


Macros have been written to construct two types of probability plots for comparing a sample 
of data to a theoretical distribution: (1) Quantile-Quantile (Q-Q) plots to compare sample 
quantiles to theoretical quantiles, and (2) Probability-Probability (P-P) plots to compare sam- 
ple cumulative proportions to theoretical cumulative probabilities. In addition, instructions 
are given below for constructing a Q-Q plot comparing two samples of data to each other. 


2.4.1 Quantile-Quantile (Q-Q) Plots Comparing a Sample to a Distribution 


Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the sample values sorted in non-descending order. Let $P_1, P_2, \ldots, P_k$ be a
set of k probabilities selected by the user; these may be any values between zero and one and
need not be sorted. Let $Q_1, Q_2, \ldots, Q_k$ be the corresponding sample quantiles. (The Q's are
calculated from the entire sample of n values.)

A Q-Q plot comparing the sample to a theoretical distribution is a scatter plot of the k
points $(F^{-1}(P_i), Q_i)$ (for $i = 1, \ldots, k$), where $F^{-1}$ is the inverse CDF for the desired theoretical
distribution (see Section 1).

The values $k = n$ and $P_i = (i-.5)/n$ are commonly used, in which case $Q_i = x_{(i)}$.
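For the common choice $P_i = (i-.5)/n$, the Q-Q coordinates can be sketched in Python, with the standard library's `statistics.NormalDist` standing in for the inverse CDF of a Normal(0,1) distribution. The function `qq_points` is a hypothetical illustration, not the %QQ macro; any other inverse CDF could be passed in its place.

```python
from statistics import NormalDist


def qq_points(sample, inverse_cdf):
    """Q-Q coordinates for k = n and P_i = (i - .5)/n.

    X: theoretical quantile F^-1(P_i); Y: sample quantile Q_i = x_(i).
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [(inverse_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


# Compare a small sample with the Normal(0,1) distribution
points = qq_points([0.3, -1.2, 0.8, 2.1], NormalDist(0, 1).inv_cdf)
```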

The macro for Q-Q plotting will operate in one of two modes, depending on whether or 
not the user provides input probabilities. In either case, a SAS data set containing an input 
sample is required. 


If input probabilities are not provided, the ECDF will be calculated, using the %ECDF
macro, at each observation of the input sample. The result is used as the probability
argument to an inverse CDF function which the user must specify as a parameter to the macro.
Values returned by the inverse CDF function are used as X-coordinates for the Q-Q plot. The
input sample values are used as Y-coordinates.


If input probabilities are provided, the %QUANT macro is used to obtain P-quantiles
from the input sample, and these are used as the Y-coordinates. The user-specified inverse
CDF function is evaluated at the same input probabilities to produce the X-coordinates.


An optional output SAS data set can be produced containing the coordinates for the Q-Q 
plot. The plot will be produced on the 3800-3 laser printer. In order to use an alternative 
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graphics device, it is necessary to modify the GOPTIONS statement contained in the %PORQ 
macro called by %QQ. (See the general information at the beginning of this section.) Macro 
parameters are in the form keyword= value and can be specified in any order. 


2.4.1.1 %QQ Macro 


General Form:

%QQ(parameter list)

Parameters:

IN= the name of the input sample SAS data set. This parameter is required. If it is
not specified, a message will be written to the SAS log and the macro will terminate.
The input sample SAS data set need not be sorted by the user.

INVAR= the name of the variable which contains the input sample. If this parameter is
not specified, the variable name X will be assumed.

INP= the name of the SAS data set containing the optional input probabilities. The
macro will operate in one of two modes depending on whether or not this
parameter has been specified. (See text preceding the macro parameter
specifications for details.) If this parameter is used, the INPVAR= parameter must
also be given.

INPVAR= the name of the variable containing the input probabilities. This parameter is
required if the INP= parameter has been specified. If INP= has been specified
and INPVAR= has not, a message will be written to the SAS log and the
macro will terminate.

FUNC= the inverse CDF specification for the desired theoretical distribution. This
parameter is required. If it is not specified, a message will be written to the
SAS log and the macro will terminate. This function will be evaluated either at
each of the default probability values derived from the ECDF calculation, or at
each value of the INPVAR variable, depending on the operational mode of the
macro, in order to produce theoretical quantiles to be used as X-axis
coordinates for the Q-Q plot. This parameter must be given in the form
FUNC=function-name(arguments), where function-name is the name of an
inverse CDF described either in “SAS User's Guide: Basics, Version 5 Edition”
or in Section 1.4 of this document, and arguments is the argument list required
by the chosen function. The first element in the function’s argument list, which
is the name of the variable at which the function is to be evaluated, must
always be coded as X.

AXES= the method used to determine the lengths of the axes for the plot. AXES=SAS
will allow SAS to determine the lengths of the axes. AXES=FIXED will fix
the axes at a predetermined percentage of the total plotting surface dimensions.
If this parameter is not specified, AXES=SAS will be assumed. (See also the
discussion of axis length at the beginning of this section.)




XAXIS= the X-axis limits and intermediate tick-mark values. This parameter is optional.
If used, its value can take any of the forms allowed for the ORDER option of the
SAS/GRAPH AXIS statement. (See the discussion of axis values at the beginning of
this section for details on the syntax of this parameter.)

YAXIS= the Y-axis limits and intermediate tick-mark values. (See XAXIS=.)


OUT= the name of the optional output SAS data set which will contain the X and Y
coordinates for the Q-Q plot. The output data set will be produced only if this 
parameter has been specified. 


QX= the name to be given to the output variable which will contain the theoretical
quantiles used as X-axis coordinates for the Q-Q plot. If this parameter is spec- 
ified without a value for the OUT= parameter, then it will be ignored. If 
OUT= has been specified and this parameter has not, then the default variable 
name QX will be used. 


QY= the name to be given to the output variable which will contain the sample 
quantiles used as Y-axis coordinates for the Q-Q plot. If this parameter is spec- 
ified without a value for the OUT= parameter, then it will be ignored. If 
OUT= has been specified and this parameter has not, then the default variable 
name QY will be used. 


2.4.2 Probability-Probability (P-P) Plots Comparing a Sample to a Distribution


Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the sample values sorted in non-descending order. Let $Q_1, Q_2, \ldots, Q_k$ be
a set of k quantile values selected by the user; these need not be sorted. Let $P_1, P_2, \ldots, P_k$ be
the corresponding values of the ECDF at the arguments $Q_i$. (The ECDF is calculated from
the entire sample of n values.)

A P-P plot comparing the sample to a theoretical distribution is a scatter plot of the k
points $(F(Q_i), P_i)$ (for $i = 1, \ldots, k$), where F is the CDF for the desired theoretical distribution
(see Section 1).

The values $k = n$ and $Q_i = x_{(i)}$ are commonly used, in which case $P_i = (i-.5)/n$.
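For the common choice $Q_i = x_{(i)}$, the P-P coordinates can be sketched in Python, using `statistics.NormalDist().cdf` as an example theoretical CDF. The function `pp_points` is a hypothetical illustration, not the %PP macro.

```python
from statistics import NormalDist


def pp_points(sample, cdf):
    """P-P coordinates for k = n and Q_i = x_(i).

    X: theoretical probability F(x_(i)); Y: sample probability (i - .5)/n.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [(cdf(x), (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]


# Compare a small sample with the Normal(0,1) distribution
points = pp_points([0.3, -1.2, 0.8, 2.1], NormalDist(0, 1).cdf)
```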


The macro for P-P plotting will operate in one of two modes, depending on whether or
not the user provides input quantiles. In either case, a SAS data set containing an input
sample is required.


If input quantiles are not provided, the ECDF will be calculated, using the %ECDF
macro, at each observation of the input sample. The results will be used as the Y coordinates for
the P-P plot. A CDF function which the user must specify as a parameter to the macro will
also be evaluated at each observation of the input sample. Values returned by the CDF will
be used as the X coordinates.


If input quantiles are provided, the macro will use linear interpolation to calculate the
corresponding probability values of the ECDF at each quantile argument, and the results will
be used as Y coordinates for the P-P plot. The user-specified CDF will be evaluated at each
quantile argument to get the X coordinates.
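The input-quantile mode can be sketched in Python by interpolating linearly between adjacent ECDF points $(x_{(i)}, (i-.5)/n)$. This is an illustration under stated assumptions, not the macro's code: `ecdf_interp` is a hypothetical name, the interpolation is assumed to be between adjacent ordered observations, and since the behaviour outside the range of the sample is not described here, the sketch holds the ECDF at $.5/n$ below $x_{(1)}$ and at $(n-.5)/n$ above $x_{(n)}$.

```python
from bisect import bisect_right


def ecdf_interp(sample, q):
    """ECDF at an arbitrary argument q, by linear interpolation between
    the points (x_(i), (i - .5)/n)."""
    ordered = sorted(sample)
    n = len(ordered)
    if q <= ordered[0]:
        return 0.5 / n              # assumption: held constant below x_(1)
    if q >= ordered[-1]:
        return (n - 0.5) / n        # assumption: held constant above x_(n)
    j = bisect_right(ordered, q)    # ordered[j-1] <= q < ordered[j], 1 <= j <= n-1
    x_lo, x_hi = ordered[j - 1], ordered[j]
    p_lo, p_hi = (j - 0.5) / n, (j + 0.5) / n
    return p_lo + (p_hi - p_lo) * (q - x_lo) / (x_hi - x_lo)


print(ecdf_interp([4, 1, 3, 2], 2.5))  # halfway between .375 and .625
```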


An optional output SAS data set can be produced containing the coordinates for the P-P 
plot. The plot will be produced on the 3800-3 laser printer. In order to use an alternative 
graphics device, it is necessary to modify the GOPTIONS statement contained in the %PORQ 
macro called by %PP. (See the general information at the beginning of this section.) Macro 
parameters are in the form keyword= value and can be specified in any order. 


2.4.2.1 %PP Macro 
General Form: 

%PP(parameter list)
Parameters: 


IN= the name of the input sample SAS data set. This parameter is required. If it is
not specified, a message will be written to the SAS log and the macro will terminate.
The input sample SAS data set need not be sorted by the user.


INVAR= the name of the variable which contains the input sample. If this parameter is 
not specified, the variable name X will be assumed. 


INQ= the name of the SAS data set containing the optional input quantile values. 
The macro will operate in one of two modes depending on whether or not this 
parameter has been specified. (See text preceding the macro parameter specifi- 
cations for details.) If this parameter is used, the INQVAR= parameter must 
also be given. 


INQVAR= the name of the variable containing the input quantile values. This parameter 
is required if the INQ= parameter has been specified. If INQ= has been spec- 
ified and INQVAR= has not, a message will be written to the SAS log and the 
macro will terminate. 


FUNC= the CDF specification for the desired theoretical distribution. This parameter is 
required. If it is not specified, a message will be written to the SAS log and the 
macro will terminate. This function will be evaluated either at each value of the
input sample, or at each value of the INQVAR variable, depending on the
operational mode of the macro, in order to produce theoretical probabilities to
be used as X-axis coordinates for the P-P plot. This parameter must be given
in the form FUNC=function-name(arguments), where function-name is the
name of a CDF described either in “SAS User’s Guide: Basics, Version 5 Edition”
or in Section 1.3 of this document, and arguments is the argument list
required by the chosen function. The first element in the function’s argument
list, which is the name of the variable at which the function is to be evaluated, 
must always be coded as X. 




AXES= the method used to determine the lengths of the axes for the plot. AXES=SAS
will allow SAS to determine the lengths of the axes. AXES=FIXED will fix
the axes at a predetermined percentage of the total plotting surface dimensions.
If this parameter is not specified, AXES=SAS will be assumed. (See also the
discussion of axis length at the beginning of this section.)

XAXIS= the X-axis limits and intermediate tick-mark values. This parameter is optional.
If used, its value can take any of the forms allowed for the ORDER option of the
SAS/GRAPH AXIS statement. (See the discussion of axis values at the beginning of
this section for details on the syntax of this parameter.)

YAXIS= the Y-axis limits and intermediate tick-mark values. (See XAXIS=.)

OUT= the name of the optional output SAS data set which will contain the X and Y
coordinates for the P-P plot. The output data set will be produced only if this
parameter has been specified.

PX= the name to be given to the output variable which will contain the theoretical
probabilities used as X-axis coordinates for the P-P plot. If this parameter is
specified without a value for the OUT= parameter, then it will be ignored. If
OUT= has been specified and this parameter has not, then the default variable
name PX will be used.

PY= the name to be given to the output variable which will contain the sample
probabilities used as Y-axis coordinates for the P-P plot. If this parameter is
specified without a value for the OUT= parameter, then it will be ignored. If
OUT= has been specified and this parameter has not, then the default variable
name PY will be used.


2.4.3 Quantile-Quantile (Q-Q) Plots Comparing Two Samples


Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$ be two samples of data. (These need not be the same size,
and they need not be sorted.) Let $P_1, P_2, \ldots, P_k$ be a single set of k probabilities selected by the
user. Let $Q^x_1, Q^x_2, \ldots, Q^x_k$ be the corresponding quantiles for the x's, obtained by using the
%QUANT macro, and let $Q^y_1, Q^y_2, \ldots, Q^y_k$ be the corresponding quantiles for the y's, obtained
by using the %QUANT macro a second time.

Then a Q-Q plot comparing the distribution of the x's to that of the y's is a scatter plot of
the k points $(Q^x_i, Q^y_i)$ (for $i = 1, \ldots, k$).

In the special case where $n = m = k$ and $P_i = (i-.5)/n$, the Q-Q plot is simply a scatter
plot of the ordered y's versus the ordered x's.
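The two-sample procedure can be sketched in Python: both samples are reduced to quantiles at the same probabilities, using the P-quantile definition of Section 2.3, and the pairs become the plot coordinates. The names `quantile` and `two_sample_qq` are hypothetical; in SAS the two quantile sets would come from two runs of %QUANT.

```python
import math


def quantile(sample, p):
    """Sample P-quantile Q(P) = x_(nP+.5), with linear interpolation (Section 2.3)."""
    ordered = sorted(sample)
    n = len(ordered)
    index = n * p + 0.5
    if index <= 1:
        return ordered[0]
    if index >= n:
        return ordered[-1]
    j = math.floor(index)
    frac = index - j
    return (1 - frac) * ordered[j - 1] + frac * ordered[j]


def two_sample_qq(xs, ys, probs):
    """Points (Q^x_i, Q^y_i): quantiles of both samples at the same P_i."""
    return [(quantile(xs, p), quantile(ys, p)) for p in probs]


# Samples of different sizes, compared at the quartiles
points = two_sample_qq([2.0, 7.5, 3.1, 4.4, 6.0], [1.2, 5.5, 3.3], [0.25, 0.5, 0.75])
```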




2.5 Random Variate Generators 


Several macros have been written to facilitate the creation of SAS data sets containing a ran- 
dom variate. Each macro will generate a different distribution: chi-square, exponential, nor- 
mal, t, or uniform. Keyword parameters enable the user to specify the desired number of 
observations, the name of the random variate, the initial seed for the random generator func- 
tion used, and parameters of the distribution. Parameters are specified in the form key- 
word= value, and can be specified in any order. 


Each macro is designed to be invoked in the context of a data step for which the user
supplies a DATA statement identifying the SAS data set to which the macro will output
observations. For example,


DATA NORM01;
%GENNORM(N=200)
RUN;


will produce SAS data set WORK.NORM01 containing 200 observations of normal random
variate X with mean 0 and variance 1 as determined by parameter defaults.


Each of these random variate generator macros uses one of the SAS random number 
functions (two, in the case of %GENT). In the macro descriptions which follow, pertinent 
SAS random number functions are identified. The SAS random number functions are 
described in Chapter 6 of “SAS User’s Guide: Basics, Version 5 Edition”. Techniques used to 
generate an observation are indicated therein. 

The seed parameter associated with each of these macros becomes the seed to the relevant 
SAS random number function(s). Your attention is directed to page 236 of “SAS User’s 
Guide: Basics” for a detailed discussion of the initialization of a random number stream. 


Please be advised that, if data generated by these macros are to be reproducible, an initial 
seed having a value greater than zero must be used and that initial seed value should be recorded. 
2.5.1 %GENCHI Macro 


%GENCHI generates observations of a Chi-square variate. This macro uses the RANGAM 
function as follows: 


ALPHA = degrees-of-freedom / 2; 
variate = 2 * RANGAM(seed,ALPHA); 


where degrees-of-freedom, variate and seed are macro parameters DF, VAR and S,
respectively.
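The same construction can be sketched in Python for readers working outside SAS. This is an illustration, not the macro: `gen_chi` is a hypothetical name, and Python's `random` module (not SAS's generator) supplies the gamma variates, so the values will not match SAS output. The seed handling only imitates the S= convention (a positive seed gives a reproducible series; 0 lets the generator seed itself).

```python
import random


def gen_chi(n=50, df=1, seed=0):
    """Chi-square variates as 2 * Gamma(df/2), the construction used by %GENCHI."""
    rng = random.Random(seed) if seed > 0 else random.Random()
    alpha = df / 2  # ALPHA = degrees-of-freedom / 2
    # gammavariate(alpha, 1.0) has shape alpha and scale 1, like RANGAM(seed, ALPHA)
    return [2 * rng.gammavariate(alpha, 1.0) for _ in range(n)]


draws = gen_chi(n=200, df=3, seed=12345)
```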


General form: 


%GENCHI (parameter list) 


Parameters: 




DF= degrees of freedom. The default value is 1. Any value specified must be an
integer.
N= the number of observations to be generated. The default value is 50. 
S= the initial seed for the random number function. The default value is 0 which
causes a CPU clock observation to be used as the initial seed. A reproducible
series of values can be obtained by using a seed >0. (See “SAS User’s Guide: 
Basics, Version 5 Edition”, page 236.) 


VAR= the name of the random variate for which values will be generated. The default 
value is X. 
2.5.2 %GENEXP Macro 


%GENEXP generates observations of an exponential variate. This macro uses the RANEXP 
function as follows: 


variate = RANEXP(seed) * THETA;
where variate, seed and THETA are macro parameters VAR, S and THETA, respectively. 
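A Python sketch of the same scaling follows (hypothetical name `gen_exp`; Python's `random` module stands in for RANEXP, so the numbers will not match SAS output, and the seed handling only imitates the S= convention). Scaling a unit-mean exponential by THETA gives mean THETA.

```python
import random


def gen_exp(n=50, theta=1.0, seed=0):
    """Exponential variates with mean theta: a unit exponential scaled by theta."""
    rng = random.Random(seed) if seed > 0 else random.Random()
    # expovariate(1.0) has mean 1, playing the role of RANEXP(seed)
    return [rng.expovariate(1.0) * theta for _ in range(n)]


draws = gen_exp(n=100, theta=2.5, seed=99)
```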
General form: 


%GENEXP(parameter list)


Parameters: 

THETA= the mean. The default value is 1. 

N= the number of observations to be generated. The default value is 50. 

S= the initial seed for the random number function. The default value is 0 which
causes a CPU clock observation to be used as the initial seed. A reproducible 
series of values can be obtained by using a seed >0. (See “SAS User’s Guide: 
Basics, Version 5 Edition”, page 236.) 

VAR= the name of the random variate for which values will be generated. The default
value is X.


2.5.3 %GENNORM Macro 


%GENNORM generates observations of a Normal random variate. This macro uses the 
RANNOR function as follows: 


variate = mu + SQRT(sigsq) * RANNOR(seed); 


where variate, mu, sigsq and seed are macro parameters VAR, MU, SIGSQ and S, respectively. 
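In Python the same location-scale transformation can be sketched as below (hypothetical name `gen_norm`; `random.Random.gauss` stands in for RANNOR, so values will differ from SAS, and the seed handling only imitates the S= convention). Note that SIGSQ is a variance, so its square root is taken before scaling.

```python
import math
import random


def gen_norm(n=50, mu=0.0, sigsq=1.0, seed=0):
    """Normal variates: mu + sqrt(sigsq) * Z, with Z ~ Normal(0,1)."""
    rng = random.Random(seed) if seed > 0 else random.Random()
    sd = math.sqrt(sigsq)  # SIGSQ is the variance, not the standard deviation
    return [mu + sd * rng.gauss(0.0, 1.0) for _ in range(n)]


draws = gen_norm(n=200, mu=120.0, sigsq=225.0, seed=2024)
```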




General form: 


%GENNORM( parameter list) 


Parameters: 
MU= the mean. The default value is 0.
N= the number of observations to be generated. The default value is 50.
S= the initial seed for the random number function. The default value is 0 which
causes a CPU clock observation to be used as the initial seed. A reproducible
series of values can be obtained by using a seed >0. (See “SAS User’s Guide:
Basics, Version 5 Edition”, page 236.)
SIGSQ= the variance of the distribution. The default value is 1.
VAR= the name of the random variate for which values will be generated. The default
value is X.


2.5.4 %GENT Macro 


%GENT generates observations of a t variate. This macro uses the RANNOR and 
RANGAM functions as follows: 


ALPHA = degrees-of-freedom / 2;

R1 = 2 * 0 + RANNOR(seed);                    /* Normal(0,1) */
R2 = 2 * RANGAM(seed,ALPHA);                  /* Chi-square  */
variate = R1 / SQRT(R2 / degrees-of-freedom); /* t           */


where degrees-of-freedom, seed and variate are macro parameters DF, S and VAR,
respectively. (Note that each function call returns a new value for seed, thereby ensuring the
independence of the Normal and chi-square random variates.)
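The ratio construction can be sketched in Python (hypothetical name `gen_t`; Python's `random` module replaces RANNOR and RANGAM, so values will not match SAS, and the seed handling only imitates the S= convention). Both draws come from the same generator stream, one after the other, so each Normal value is independent of the chi-square value it is divided by.

```python
import math
import random


def gen_t(n=50, df=1, seed=0):
    """t variates as Z / sqrt(C / df), with Z ~ Normal(0,1) and C ~ chi-square(df)."""
    rng = random.Random(seed) if seed > 0 else random.Random()
    alpha = df / 2  # ALPHA = degrees-of-freedom / 2
    out = []
    for _ in range(n):
        r1 = rng.gauss(0.0, 1.0)               # Normal(0,1), the role of RANNOR
        r2 = 2 * rng.gammavariate(alpha, 1.0)  # chi-square with df degrees of freedom
        out.append(r1 / math.sqrt(r2 / df))    # t with df degrees of freedom
    return out


draws = gen_t(n=200, df=5, seed=31)
```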


General form: 


%GENT(parameter list)


Parameters: 

DF= degrees of freedom. The default value is 1. Any value specified must be an
integer. 

N= the number of observations to be generated. The default value is 50. 


S= the initial seed for the random number function. The default value is 0 which
causes a CPU clock observation to be used as the initial seed. A reproducible
series of values can be obtained by using a seed >0. (See “SAS User’s Guide:
Basics, Version 5 Edition”, page 236.)




VAR= the name of the random variate for which values will be generated. The default 
value is X. 
2.5.5 %GENUNI Macro 


%GENUNI generates observations of a uniform random variate on the interval (0,1). This 
macro uses the RANUNI function as follows: 


variate = RANUNI (seed); 
where variate and seed are macro parameters VAR and S, respectively. 
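A Python sketch (hypothetical name `gen_uni`) also illustrates the reproducibility advice given at the start of this section: with the same positive seed, the same series is produced. Python's generator is not SAS's, so the values themselves differ from RANUNI's, and `random()` returns values in [0, 1) rather than the open interval (0,1).

```python
import random


def gen_uni(n=50, seed=0):
    """Uniform variates; seed > 0 gives a reproducible series, 0 self-seeds."""
    rng = random.Random(seed) if seed > 0 else random.Random()
    # random() returns values in [0.0, 1.0); RANUNI's interval is (0,1)
    return [rng.random() for _ in range(n)]


first = gen_uni(n=5, seed=20891)
again = gen_uni(n=5, seed=20891)
print(first == again)  # the same positive seed reproduces the same series
```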
General form: 


%GENUNI (parameter list) 


Parameters: 

N= the number of observations to be generated. The default value is 50.

S= the initial seed for the random number function. The default value is 0 which 
causes a CPU clock observation to be used as the initial seed. A reproducible 
series of values can be obtained by using a seed > 0. (See “SAS User’s Guide: 
Basics, Version 5 Edition”, page 236.) 

VAR= the name of the random variate for which values will be generated. The default
value is X.
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Section 3 
USER INTERFACE 


The libraries containing the macros and functions described by this document are made avail- 
able automatically to users of the RGS-supported (STC2.SAS) catalogued procedures and 
clists. (Originally, it was necessary for a user to establish access to the libraries by means of 
parameters to the procedures and clists, but this requirement has been eliminated prior to dis- 
tribution of this document.) 


Some minor operational requirements which remain pertinent are described below. 


3.1 Batch Mode 


When you intend to use a macro or macro option which will generate a SAS/GRAPH plot, 
you must invoke the SASG3800 catalogued procedure. This procedure allocates the files
necessary for graphics output to IBM laser printers via an interface with GDDM³ software, and
allocates a default virtual storage region adequate for most graphics applications which use that
interface. Symbolic parameters DEST and GCOPIES can be used to route graphs to a remote 
38xx printer (where applicable), and to generate multiple copies of graphs, respectively. 
(GCOPIES should be used with discretion because graphics images are retransmitted to the 
printer for each copy.) 


3.2 Interactive Mode 


The default TSO logon region is adequate for most SAS sessions that do not use 
SAS/GRAPH. However, if graphics are to be produced, the logon region should be set to 
3000K as in the following example. 


TSO userid A(account) S(3000)
When you intend to use a macro or macro option which will generate a SAS/GRAPH 
plot, you must specify the GRAPH38 parameter when you invoke the SAS clist. This is done
subsequent to issuing the START SAS command and allocating required personal data sets.
The following example illustrates the command sequence. 


³ IBM’s Graphical Data Display Manager




START SAS 

ALLOC F(filename) DA('dsname') 

SAS GRAPH38 
Parameters DEST and GCOPIES can be used with the SAS clist to route graphs to a remote 
38xx printer (where applicable), and to generate multiple copies of graphs, respectively. 
(Detailed TSO help information is available for the SAS clist upon return from the START
SAS command. GCOPIES should be used with discretion because graphics images are
retransmitted to the printer for each copy.)
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Appendix A 
FORMULAS FOR PROBABILITY DENSITY FUNCTIONS 
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EXAMPLES OF GRAPHS * 
1. Histogram of standardized data and superimposed Normal(0,1) PDF,
produced by %HIST Macro.

2. ECDF of standardized data and superimposed Normal(0,1) CDF,
produced by %ECDF Macro.

3. Q-Q plot comparing unstandardized data to Normal(0,1) distribution,
produced by %QQ Macro.

4. P-P plot comparing standardized data to Normal(0,1) distribution,
produced by %PP Macro.


* The data - observations of systolic blood pressure for 5802 Canadians - 
are from Statistics Canada's 1978/79 Canada Health Survey. 
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